Sketching Aggregates over Probabilistic Streams

Author

  • Erik Vee
Abstract

The data stream model of computation has proven a valuable tool in developing algorithms for processing large amounts of data in small space. This survey examines an extension of this model that deals with uncertain data, called the probabilistic stream model. As in the standard setting, we are presented with a stream of items, with no random access to the data. However, each item is represented by a probability distribution function, allowing us to model the uncertainty associated with each element. We examine the computation of several aggregates in the probabilistic stream setting, including the frequency moments of the stream, average, minimum, and quantiles. The key difficulty in these computations is that the stream represents an exponential number of possible worlds, and even simple quantities such as the length of the stream can differ from one possible world to another. Obtaining accurate, reliable estimates can be highly non-trivial.
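To make the notion of possible worlds concrete, here is a minimal Python sketch under stated assumptions. It is not taken from the survey: the toy `stream`, the dictionary encoding of each item, and the function names are all hypothetical, and leftover probability mass is assumed to mean that the item is absent. The sketch enumerates every possible world of a three-item stream and compares the brute-force expected length with a one-pass computation based on linearity of expectation.

```python
from itertools import product

# Hypothetical toy stream (an assumption for illustration only).
# Each item maps a value to its probability; any leftover mass
# means the item does not appear in that possible world.
stream = [
    {"a": 0.5},             # "a" with prob 0.5, absent with prob 0.5
    {"a": 0.25, "b": 0.5},  # "a", "b", or absent with prob 0.25
    {"b": 1.0},             # always "b"
]

def possible_worlds(stream):
    """Yield (world, probability) for every deterministic stream."""
    choices = []
    for item in stream:
        options = list(item.items())
        absent = 1.0 - sum(item.values())
        if absent > 1e-12:
            options.append((None, absent))  # None marks "item absent"
        choices.append(options)
    # The number of worlds is the product of the option counts,
    # which grows exponentially with the stream length.
    for combo in product(*choices):
        prob = 1.0
        world = []
        for value, p in combo:
            prob *= p
            if value is not None:
                world.append(value)
        yield tuple(world), prob

def expected_length_bruteforce(stream):
    """Expected stream length by summing over all possible worlds."""
    return sum(len(world) * prob for world, prob in possible_worlds(stream))

def expected_length_one_pass(stream):
    """Same quantity in one pass: sum of per-item presence probabilities."""
    return sum(sum(item.values()) for item in stream)

if __name__ == "__main__":
    worlds = list(possible_worlds(stream))
    print(len(worlds), "possible worlds")
    print("lengths seen:", sorted({len(w) for w, _ in worlds}))
    print("brute force :", expected_length_bruteforce(stream))
    print("one pass    :", expected_length_one_pass(stream))
```

The expected length happens to be easy because expectation is linear. The aggregates named in the abstract, such as frequency moments or the minimum, do not decompose item by item in this way, and brute-force enumeration is only feasible for toy inputs like this one.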

Similar resources

Estimating Aggregate Properties on Probabilistic Streams

The probabilistic-stream model was introduced by Jayram et al. [16]. It is a generalization of the data stream model that is suited to handling "probabilistic" data where each item of the stream represents a probability distribution over a set of possible events. Therefore, a probabilistic stream determines a distribution over potentially a very large number of classical "deterministic" streams...

Sketching Streams Through the Net: Distributed Approximate Query Tracking

Emerging large-scale monitoring applications require continuous tracking of complex data-analysis queries over collections of physically distributed streams. Effective solutions have to be simultaneously space/time efficient (at each remote monitor site), communication efficient (across the underlying communication network), and provide continuous, guaranteed-quality approximate query answers. In...

Sketch-based Querying of Distributed Sliding-Window Data Streams

While traditional data-management systems focus on evaluating single, ad hoc queries over static data sets in a centralized setting, several emerging applications require (possibly continuous) answers to queries on dynamic data that is widely distributed and constantly updated. Furthermore, such query answers often need to discount data that is “stale”, and operate solely on a sliding window of...

POISketch: Semantic Place Labeling over User Activity Streams

Capturing place semantics is critical for enabling location-based applications. Techniques for assigning semantic labels (e.g., “bar” or “office”) to unlabeled places mainly resort to mining user activity logs by exploiting visiting patterns. However, existing approaches focus on inferring place labels with a static user activity dataset, and ignore the visiting pattern dynamics in user activit...

A Simple and Efficient Estimation Method for Stream Expression Cardinalities

Estimating the cardinality (i.e. number of distinct elements) of an arbitrary set expression defined over multiple distributed streams is one of the most fundamental queries of interest. Earlier methods based on probabilistic sketches have focused mostly on the sketching algorithms. However, the estimators do not fully utilize the information in the sketches and thus are not statistically effic...

Publication date: 2008